Xing Zhu

GA-VLN: Geometry-Aware BEV Representation for Efficient Vision-Language Navigation

May 21, 2026

CausalCine: Real-Time Autoregressive Generation for Multi-Shot Video Narratives

May 12, 2026

Geometric Context Transformer for Streaming 3D Reconstruction

Apr 16, 2026

SceneScribe-1M: A Large-Scale Video Dataset with Comprehensive Geometric and Semantic Annotations

Apr 09, 2026

Causal World Modeling for Robot Control

Jan 29, 2026

Advancing Open-source World Models

Jan 28, 2026

A Pragmatic VLA Foundation Model

Jan 26, 2026

Masked Depth Modeling for Spatial Perception

Jan 25, 2026

PhysRVG: Physics-Aware Unified Reinforcement Learning for Video Generative Models

Jan 16, 2026

The World is Your Canvas: Painting Promptable Events with Reference Images, Trajectories, and Text

Dec 18, 2025